FOREWORD

The design, development, fabrication, and test of a high-speed, low-power
correlator was performed by TRW Defense and Space Systems Group in Redondo Beach,
California. This work was sponsored by the Naval Electronic Systems Command
(L. W. Sumney, Code 3042) and was directed by the Naval Research Laboratory
(D. F. Barbe, Code 5260).

The principal investigator was David Breuer, Manager of the High Speed
Bipolar Department in the Microelectronics Center of TRW Defense and Space
Systems Group. Principal contributors to this project were Albert Cosand,
Wayne Current, Diogenes Cordero, and Alan Templin.
1. INTRODUCTION


This  final  report  describes  the  work  done  during  the  fourth  phase 
of  the  High-Speed,  Low-Power  Correlator  Development  program. 

The  objective  of  the  fourth  phase  of  this  program  was  the  development 
of  a monolithic  32-bit  digital  parallel  correlator  with  an  analog  summed 
output,  and  a digitally  summed  output. 

The  correlator  functional  block  diagram  is  shown  in  Figure  1-1. 

Two independently clocked 32-bit shift registers (A and B) are compared
bit by bit by 32 exclusive-OR circuits. Each exclusive-OR circuit:

• Controls a D/A current source; the output currents are summed
into a common node to produce an analog output correlation function.

• Provides a digital signal to the 32-bit digital summer; the output
is a binary-coded digital word representing the sum of digits
which agree at any one time between the two shift registers.
Figure 1-1. Correlator Block Diagram

Both shift register outputs are available for cascading LSI modules
to  form  longer  word  lengths.  The  correlator  will  provide  both  analog  and 
digital  outputs;  the  user  has  the  choice  of  either  signal  form.  The 
digital  summing  circuit  will  receive  independent  power,  so  if  the  user  is 
interested  in  only  the  analog  output,  the  power  of  the  unused  digital 
circuitry  can  be  reduced  to  zero. 


The basic specification goals of the proposed correlator circuit
include:

• Signal and reference data entered into two independently
clocked shift registers.

• Analog current output proportional to the number of bits
correlating between the two shift registers.

• Binary-coded digital output which represents the number of
bits correlating between the two shift registers.

• An independent clock control on the digital summing circuit.

• All digital interfaces ECL compatible.

• Analog output current of 0.2 mA per bit, plus offset current,
with bit-to-bit accuracies ≤ 5% with respect to the nominal value.

• Power dissipation of 480 mW.

• Clock rate of each clock input of 150 MHz.

• Digital output appears no later than five clock periods
after the correlation measurement.

• Temperature range of -55°C to +125°C.
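The behavior described by these goals can be sketched in a few lines of Python. This is only a behavioral model under stated assumptions, not the circuit: the function names, the zero offset current, and the example reference word are all hypothetical; only the 32-bit length and the 0.2 mA per agreeing bit come from the goals above.

```python
def correlate(reg_a, reg_b, ma_per_bit=0.2, offset_ma=0.0):
    """Return (agreement count, analog output current in mA).

    offset_ma is an assumed placeholder; the report gives no offset value.
    """
    if len(reg_a) != len(reg_b):
        raise ValueError("registers must have the same length")
    # A comparator contributes one unit of current when its two bits agree.
    agreements = sum(1 for a, b in zip(reg_a, reg_b) if a == b)
    return agreements, offset_ma + ma_per_bit * agreements


def shift_in(reg, bit):
    """Clock one bit into the front of a register, dropping the oldest bit."""
    return [bit] + reg[:-1]


reference = [1, 0] * 16            # arbitrary 32-bit reference word
signal = [0] * 32
for bit in reversed(reference):    # clock the same word into the signal register
    signal = shift_in(signal, bit)

count, current = correlate(signal, reference)
print(count, round(current, 2))    # 32 6.4
```

With the two registers holding identical words, all 32 comparators agree and the modeled output is 32 × 0.2 mA = 6.4 mA above the offset.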


2.  CORRELATOR  DEVELOPMENT 

The function of the digital correlator is to compute the number of
agreement bits between two binary words (usually of length $2^N - 1$).
When $2^n$-bit correlators are used, the outputs from $2^N/2^n = 2^{N-n}$
correlators will then be summed together as the overall output.

In each correlator, $2^n$ exclusive-NOR gates (comparators) compare
the corresponding bits between two $2^n$-bit binary words stored in the
shift registers. Here the problem of summing can be looked at in
two modes:

• To sum $2^n$ binary bits into an (n+1)-bit word inside a correlator.

• To combine $2^{N-n}$ n-bit words into an (N+1)-bit word.

Correlator  Digital  Summing 

The function of the digital summer is to sense the states of the $2^n$
binary comparators and to generate an (n+1)-bit binary number corresponding
to the number of comparators which are in the logic 1 state. A pipeline
approach is used to achieve a maximum operating rate. The pipeline summing
is illustrated in Figure 2-1 using 7 inputs as an example. Latched 3-input
full adders are the basic building block, which allows a summing rate
limited only by a full adder delay.


It can be shown that, in general, the optimal summing circuit (using the
minimum number of full adders) is one that computes the sum of $2^n - 1$ inputs. The
minimum number of full adders, $S_n$, to compute the sum of $2^n - 1$ inputs can
be expressed in an equation as follows:

$$S_n = 2^n - 1 - n, \qquad n \geq 2$$
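The full adder count can be checked with a small Python sketch. It reduces single-bit inputs with 3-input full adders only, three bits of equal weight at a time, and counts the adders used. The function names are hypothetical, and the greedy reduction order is an assumption; it is not claimed to match the pipelined arrangement of the figures, only to confirm the count $2^n - 1 - n$.

```python
def full_adder(a, b, c):
    """3-input full adder: returns (sum bit, carry bit)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def adder_tree_sum(bits):
    """Sum single-bit inputs using full adders only.

    Returns (total, number of full adders used). Bits of equal binary
    weight are combined three at a time; each carry moves up one weight.
    """
    columns = {0: list(bits)}      # weight -> bits waiting at that weight
    adders = 0
    weight = 0
    while weight in columns:
        col = columns[weight]
        while len(col) >= 3:
            s, c = full_adder(col.pop(), col.pop(), col.pop())
            col.append(s)
            columns.setdefault(weight + 1, []).append(c)
            adders += 1
        weight += 1
    total = sum(bit << w for w, col in columns.items() for bit in col)
    return total, adders


for n in range(2, 7):
    total, adders = adder_tree_sum([1] * (2 ** n - 1))
    print(n, total, adders, 2 ** n - 1 - n)
```

For 7 inputs the sketch uses 4 full adders and for 31 inputs it uses 26, in agreement with the formula.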
Figure 2-1. A 7-Bit Pipeline Summing Circuit


Figure  2-2  shows  an  optimal  digital  31-input  summing  circuit.  It  uses 
26  latched  full  adders  and  16  latches,  and  provides  a 5-bit  "skewed"  binary 
number  as  its  output.  The  skewed  output  means  that  the  output  is  available 
after 5 clock periods with the least-significant-bit (LSB) available at the
fourth  clock  period  after  the  31  inputs  are  applied  to  the  digital  summing 
circuit,  the  second  LSB  available  at  the  end  of  the  fifth  clock  period,  and 
so  forth.  Providing  the  skewed  output  will  simplify  the  hardware  required 
to  expand  the  correlator  for  processing  of  words  of  long  lengths. 

For  a 32-bit  summer,  an  additional  5 half  adders  will  be  needed  to 
expand  the  31-bit  summing  circuit  to  provide  the  6-bit  binary  output.  This 
add-one  circuitry  can  be  implemented  on-chip  or  using  external  logic  circuits. 
However,  to  operate  the  correlator  as  a self-contained  32-bit  correlator, 
it  is  recommended  that  the  add-one  circuitry  should  be  built  on-chip. 

Figure 2-3 presents the complete digital summing circuit for the 32-bit
correlator. The add-one circuit uses full adders instead of half adders
at a slight increase in circuit complexity, but provides the capability
of combining the 5-bit binary number S1S2S3S4S5, representing a number
up to 31, the 32nd correlator output, and an external 5-bit binary number
I1I2I3I4I5, representing a number up to 31, into a 6-bit binary output,
T1T2T3T4T5T6, representing a maximum count of 63. This allows an
implementation of a 63-bit correlator with two 32-bit correlator modules.
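The combining step can be illustrated with a short Python sketch. It assumes a plain ripple arrangement of five full adders in which the 32nd comparator output enters as the carry-in of the first stage; that wiring, the function names, and the LSB-first bit ordering are assumptions for illustration and are not taken from Figure 2-3. The latching that produces the skewed output is not modeled.

```python
def to_bits(value, width):
    """LSB-first list of the low `width` bits of value."""
    return [(value >> k) & 1 for k in range(width)]


def combine(s, i, bit32):
    """Add two 5-bit counts and one extra bit using five full adders.

    s     : internal 5-bit sum of 31 comparators (0..31)
    i     : external 5-bit number (0..31)
    bit32 : output of the 32nd comparator (0 or 1), used as carry-in
    Returns the 6-bit result as an integer (0..63).
    """
    carry = bit32
    t = []
    for a, b in zip(to_bits(s, 5), to_bits(i, 5)):
        t.append(a ^ b ^ carry)                       # sum output of this stage
        carry = (a & b) | (a & carry) | (b & carry)   # carry to the next stage
    t.append(carry)                                   # sixth output bit
    return sum(bit << k for k, bit in enumerate(t))


print(combine(31, 31, 1))   # 63
```

Using full adders rather than half adders is what leaves an input free at each stage for the external number, so the same five stages serve both the add-one function and the two-module cascade.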

In addition to the six-bit skewed output, T1T2T3T4T5T6, a synchronous
sum is available on pins SS1SS2SS3SS4SS5SS6. The availability of the
synchronous sum is a matter of convenience, requiring the addition of only
10 latches to the circuit.

Implementation for Correlators of Very Long Length

The correlator design with the digital summing circuit shown in
Figure 2-3 can be used to implement correlators of very long word length,
say $2^N - 1$ or $2^N$, for large N's. The number of 32-bit correlator modules is
computed by $2^N \div 32 = 2^{N-5}$. For a 63-bit correlator, where N = 6, two
correlators will be used without any additional circuits required, as shown
in Figure 2-4. Here both skewed and synchronous (non-skewed) outputs are
available for further signal processing. The synchronous output can be
used directly for functions such as correlation threshold detection.

Figure 2-4. Interconnection of Two 32-Bit Summers to Implement a 63-Bit Summer

The skewed output can be used for further expanding the 63 (64)-bit correlators
to implement correlators of very long word lengths. Figure 2-5 shows the
implementation of a 127-bit correlator. Note that the "extra" delay incurred
in the generation of the 64th correlator output causes it to be available
on the clock phase when it is required, which avoids the need to add a latch
to delay it. Any $(2^N - 1)$-bit correlator can be implemented in a similar way,
and the external hardware to combine the 63 (64)-bit correlator outputs
into an N-bit skewed output can be estimated by:

A. Number of latched full adders* $= \sum_{i=6}^{N-1} i \cdot 2^{N-i-1}$

B. Number of latches $= 2^{N-6} - 1$

If synchronous outputs are required, the hardware in addition to the above is:

C. Number of latches $= \sum_{j=1}^{N-2} j$

D. Number of output latches $= N$

Table 1 presents the hardware requirements for the implementation of
$(2^N - 1)$-bit correlators, for N = 5 to 10. This summing technique can be
expanded to any large N.
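The estimates can be evaluated numerically with a short Python sketch. It reads item A as the sum of $i \cdot 2^{N-i-1}$ for i from 6 to N-1 and item C as the sum of j for j from 1 to N-2; both readings are assumptions about formulas that are poorly legible in the source, and the function names are hypothetical. The readings are supported by the legible entries of Table 1 (6, 19, and 101 full adders) and by the 10 deskewing latches quoted earlier for the six-bit synchronous output.

```python
def modules(N):
    """Number of 32-bit correlator modules: 2^N / 32 = 2^(N-5)."""
    return 2 ** (N - 5)


def latched_full_adders(N):
    """Item A (assumed reading): sum over i = 6..N-1 of i * 2^(N-i-1)."""
    return sum(i * 2 ** (N - i - 1) for i in range(6, N))


def deskew_latches(N):
    """Item C (assumed reading): sum over j = 1..N-2 of j."""
    return sum(range(1, N - 1))


print(" N   bits  modules  item A  item C  item D")
for N in range(5, 11):
    print(f"{N:2d} {2 ** N - 1:6d} {modules(N):8d} "
          f"{latched_full_adders(N):7d} {deskew_latches(N):7d} {N:7d}")
```

No external full adders are needed through N = 6, and the count then grows to 101 at N = 10, which is the repetitive structure that motivates a dedicated summer circuit.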

Need  of  a Summer  LSI 

Note  that  this  external  summing  circuit  is  quite  repetitive  in  nature. 

The  overall  parts  count  can  be  greatly  reduced  if  another  LSI  circuit  is 
developed  to  perform  the  external  summing  for  implementing  correlators 
of  very  long  word  lengths.  As  an  example,  consider  the  "skewed"  8-bit 
latched  full  adder  as  shown  in  Figure  2-6.  If  this  device  is  available, 
item A in Table 1 can be simplified as follows:


*A  full  adder  followed  by  a latch. 
Figure 2-5. A 127-Bit Digital Correlator

Figure 2-6. A "Skewed" 8-Bit Adder

TABLE 1. HARDWARE SUMMARY OF $(2^N - 1)$-BIT CORRELATOR

Number of bits = $2^N - 1$       31    63    127        255         511         1,023

A. Full adders                   0     0     6          19          46          101
   (with outputs latched)                    (5 IC's)   (15 IC's)   (29 IC's)   (76 IC's)

Replace by skewed 8-bit          0     0     1          3           7           15
   full adders

It  is  obvious  that  the  summing  circuit  is  greatly  simplified  for  large  N's. 

LSI Implementation of the 32-DOC-1

Electrical design options have been implemented which allow the S output
signals to be either low-level differential signals or single-ended,
ECL-compatible signals. The
power supply for the ECL-compatible option is -6.0 volts, and the estimated
chip power is approximately 0.8 watts. Since the S output signals interface
only with another 32-DOC-1 chip, the full ECL signal swing may be unnecessary.
By making use of the low-level differential logic swing option, the power
supply voltage may be reduced to -5.2 volts, resulting in an estimated chip
power of 0.7 watts. In either form, the projected clocking rate is 150 MHz.
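The two power estimates are consistent with a simple scaling argument, sketched below in Python. The sketch assumes the total supply current stays the same when the supply is reduced, which is plausible for logic biased by constant current sources but is an assumption here, not a statement from the report; the function name is hypothetical.

```python
def scaled_power(p_watts, v_old, v_new):
    """Chip power at a new supply voltage, assuming unchanged supply current."""
    return p_watts * abs(v_new) / abs(v_old)


p_low = scaled_power(0.8, -6.0, -5.2)
print(round(p_low, 2))                 # 0.69

# Implied supply current at the -6.0 V option, in mA
print(round(0.8 / 6.0 * 1000))         # 133
```

Under that assumption, 0.8 W at -6.0 V scales to about 0.69 W at -5.2 V, close to the quoted 0.7 W.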

The layout organization of the 32-DOC-1 chip is shown in Figure 2-7.
On the right are the clock and voltage reference circuits. Eighteen bits
of correlation are placed on the top of the die, and fourteen on the bottom.
The digital summer occupies the center and left portions of the die. The
signal flow of the 32-DOC chip is also shown.

The 32-DOC-1 chip contains approximately 3350 transistors and 1120
resistors. The die size is 238 mils by 173 mils.

Circuit  Design 

The thirty-two bit digital output correlator, 32-DOC-1, is designed
using a low-level differential logic implementation. One bit of correlation,
as implemented in the 32-DOC, is shown in Figure 2-8. The shift registers
are straightforward implementations of differential logic, except level-shifted
for interfacing with the adder tree. The left-hand shift register is
referenced -VBE with respect to the right-hand shift register. This
facilitates driving the exclusive-OR without level shifters in the data
lines. This approach offers high-speed performance, lower power, and lower
parts count than the more conventional approach of utilizing level shifters
in the data lines. The input end-bit and output end-bit are shown in
Figures 2-9 and 2-10, respectively. The input end-bit is nearly identical
to the middle bits with the exception that an ECL threshold generator is
included, so that the input bit can accept data in the form of a
single-ended ECL input signal with normal ECL levels. The output bit contains
a single master latch and two parallel slave latches in both of the data
registers. One of the slave latches provides the low-level differential
signal required by the exclusive-OR to perform the correlation; the other
operates with a standard ECL swing and drives the shift register output.
This ensures that the correlation operation will be independent of output
loading, and simplifies the correlator.

Figure 2-8

Figure 2-11 shows the complete summing circuit for the 32-bit correlator.
The principal building block of the digital summer is the latched full
adder. Several versions of this circuit are used, shown in Figures 2-12
through 2-15. These differ mainly in the number of input level shifters
that are required to interface to the particular set of inputs supplied
to a given adder, and in the options for output voltage swing in Figure 2-14.
The circuit operation is most easily seen in Figure 2-12, which has the least
level shifting circuitry. There are two independent latched gates; one
generates the sum and the other the carry. In the circuit as drawn, the
latches are configured as master latches; they may be operated as slaves
by interchanging the φ and φ̄ lines. The summer also utilizes simple
latches to provide delays as required in the summing tree and to deskew
the outputs. Five versions of the latches, differing in input and/or
output level shifting or output voltage swing, are utilized in various
places in the summer. The schematics of the five latch types are shown
in Figures 2-16 through 2-20. The specific type of adder-latch or delay
latch used at each position in the summer is listed in the summer block
diagram of Figure 2-11.
Figure 2-13
The reference voltages which control the current sources in the digital
circuitry are shown in Figures 2-21 and 2-22. One controls the shift
registers and exclusive-OR gates of the correlator section, and the other
controls the summer. The reference generator for the summer normally has
the input labelled "$c" tied to ground. The summer can be turned off by
disconnecting this pad if only the analog correlator function is required.
The two circuits are essentially identical in operation. They consist of
a bandgap regulator cell to generate a temperature-independent voltage, which
is then used to generate a voltage to drive the current sources to provide
constant current. There is also circuitry to compensate the output against
power supply variations.

The analog reference circuit in Figure 2-23 drives the current sources
that are summed into the analog correlator output. An external resistor
is used to set the full-scale current output. This circuit also corrects
for variations in transistor gain that would otherwise cause the output
current to vary.

The clock buffers in Figures 2-24 and 2-25 provide correct levels to
operate the shift registers and summer. When the summer is configured to
provide ECL output levels, an optional additional level shift is included
in the summer clock buffer to avoid saturating the clock transistors in
the summer.

A photograph of the packaged 32-DOC-1 circuit is shown in Figure 2-25.
The chip shown is configured for ECL outputs from the summer. It is
packaged in a standard 5/8" diameter round flatpack with 40 leads. Leads
are on .050" centers.
3. TEST PROCEDURES AND RESULTS


The functional block diagram of the 32-bit digital output correlator
is shown in Figure 3-1. Two independently clocked 32-bit shift registers
are compared bit by bit by 32 exclusive-OR circuits. Each exclusive-OR
circuit controls a D/A current source, the output currents of which are
summed into a common node to produce an analog output correlation
function. The exclusive-OR circuit also provides a digital signal to the
32-bit digital summer, which produces a binary-coded digital word representing
the sum of digits which agree at any one time between the two shift registers.

A simple  test  was  designed  to  test  for  functionality  of  the  device, 
both  at  wafer  probe  and  in  packaged  form.  A 32-bit  word  is  shifted  through 
one  shift  register,  while  the  other  is  held  static.  The  analog  output  of 
a properly  functioning  circuit  is  a 32-bit  staircase  function.  The  test 
set  is  arranged  so  that  the  conditions  can  be  reversed  to  facilitate 
looking  at  both  shift  register  outputs. 
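The expected staircase can be reproduced with a small Python model of this test. The test word (all ones), the static register contents (all zeros), and the initial contents of the moving register (all zeros) are assumptions chosen for illustration, since the report does not list them; the function names are hypothetical.

```python
def agreements(reg_a, reg_b):
    """Number of bit positions in which the two registers agree."""
    return sum(1 for a, b in zip(reg_a, reg_b) if a == b)


def staircase(test_word, static_reg):
    """Agreement count after each clock as test_word is shifted in."""
    moving = [0] * len(static_reg)          # assumed initial contents
    levels = [agreements(moving, static_reg)]
    for bit in test_word:
        moving = [bit] + moving[:-1]        # one clock of the moving register
        levels.append(agreements(moving, static_reg))
    return levels


levels = staircase([1] * 32, [0] * 32)
print(levels[:3], levels[-3:])              # [32, 31, 30] [2, 1, 0]
```

Each clock changes the agreement count by one, so the modeled analog output steps through 32 equal increments between full agreement and full disagreement.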

A simple circuit for generating the test word is shown in Figure 3-2.
This circuit produces inputs for shift registers QA and QB, as well as
a scope sync pulse and clock.

Figure 3-3 shows the analog output of a functional 32-DOC-1 chip.
The upper trace shows the output of shift register A, with shift register B
held static. The output from the circuit with the input conditions reversed
is shown in Figure 3-4.

A convenient way to probe test for 32-DOC-1 circuits with properly
functioning digital outputs is to sum the digital outputs into a D/A
converter, and then look at the reconstructed output. Figure 3-5 shows
a block diagram to take the digital outputs from the device under test,
sum them into a D/A converter, and make the output from the D/A available
for monitoring on a scope. The output from a functional device should be
a 32-bit staircase function.

In order to fully evaluate a working device, the bench tester was
designed so that the digital outputs can be monitored individually.
Figure 3-1.

Figure 3-3. 32-DOC-1 Analog Output (traces: SR output, analog output)

Figure 3-4. 32-DOC Analog Output (traces: SR output, analog output)
The test is essentially the same as that used for probe testing. Provision
is made for shifting a test word into either shift register, QA or QB, while
holding the other static. Each digital output, SS1 to SS6, may then be looked
at individually. A timing diagram showing the relationship of the analog
output to the digital outputs is shown in Figure 3-6.

Figure 3-7 shows an expanded analog output showing the relative
analog accuracy and linearity of the device, with errors less than ±5.0%.

The circuits, as tested, have an error in the analog output. An error
in the isolation mask sets the current in bit 32 to be the same as bit 31.
This results in an analog output with 31 apparent steps. This error requires
one mask change to correct it. The digital output functions correctly.
Digital outputs $2^2$ through $2^5$, referenced to the analog output, are shown
in Figure 3-8. These may be compared with the timing diagram shown in
Figure 3-6. The digital outputs $2^0$, $2^1$, $2^2$, $2^3$, $2^4$, and $2^5$ are shown in
Figure 3-9.

The correlator circuits have been tested at high speed. For the
high-speed test, a single bit was shifted into the input QA, and the
clock rate increased until the unit failed to operate correctly.
Figure 3-9 shows a typical unit operating at 125 MHz. Both shift
registers operate at the rated speed. The chip power level is 840 mW.

A summary of the test results from the chips tested is shown below in
Table 3-1.


Table 3-1.

Power Supply, VEE: -6.0 Volts

Unit           Power (mW)   Clock Rate (MHz)

10-115-3-6     804          125
10-115-7-7     894          125
13-146-6-6     8C4          125
13-146-8-6     876          125
13-146-10-4    870          125
13-146-10-5    828          125
The  bonding  diagram  showing  the  pin  connections  is  shown  in  Figure  3-10. 
Figure 3-5. 32-DOC Probe Tester

Timing Diagram

Figure 3-8. 32-DOC Analog-to-Digital Outputs

Figure 2-23.

Figure 3-10. 32-DOC High Frequency Shift Register Output

Figure 3-11. 32-DOC Bonding Diagram


4.  MONOLITHIC  PROCESS 


The  OAT  (Oxide  Aligned  Transistor)  technology  was  developed  by  TRW 
to  meet  the  demands  of  a high  speed,  low  power  bipolar  LSI  technology. 

The  basic  OAT  process  has  been  in  use  at  TRW  since  1971,  and  was  used 
in  this  development  to  meet  the  demands  of  producing  a low  power,  digital 
output  correlator  which  operates  at  150  MHz. 

BASIC  OAT  PROCESS  DESCRIPTION 

The OAT process was designed to give microwave transistor devices at
LSI yields. Since the small geometries needed for high frequency devices
are opposed to the LSI complexity level objectives, special techniques are
employed to achieve both simultaneously. These include the utilization
of oxide wells for pseudo self-aligning and a polycrystalline arsenic
emitter for higher yield of shallow base width transistors.

Oxide  Well 

In  industry  standard  practice,  each  diffusion  step  is  masked  by  a 
separate  and  independent  photoresist-etch  step.  Each  mask  must  be  aligned 
very  precisely  to  maintain  minimum  geometry  construction.  The  limitations 
on  minimum  device  dimensions  are  consequently  a function  of  best  routine 
alignment  capability  and  worst-case  mask  distortions,  such  as  run-out. 

OAT minimizes these problems by using a thick oxide well, as shown in
Figure 4-1, to prealign subsequent diffusions. In the case shown, one
mask prealigns three diffusions. Take, for instance, the isolation diffusion:
the isolation mask need only be aligned anywhere within the region shown in
Figure 4-2 to achieve "perfect alignment", and likewise for the remaining two
diffusions.

Two  well  structures  are  used  in  OAT.  A deep  well  prealigns  the 
isolation  diffusion,  deep  collector  N+  diffusion,  and  the  base  diffusion. 

A shallow  well  prealigns  the  base  contacts,  the  base  enhancement  diffusion, 
and  the  emitter.  The  oxide  well  structure,  therefore,  provides: 
Figure 4-1. OAT Deep Well Structure

Figure 4-2. Alignment of the Isolation Mask (labels: isolation mask, available tolerances, pre-aligned oxide structure)



• Smaller  device  dimensions  and  lower  junction  capacitance  due 
to  limited  lateral  diffusion. 

• Smaller  device  dimensions  for  a given  mask  and  alignment 
capability  (providing  higher  complexity  LSI). 

• Lower  junction  capacitance  due  to  the  oxide  side  walls. 

• Smaller  base-collector  junction  area  since  base  contacts 
can  be  placed  at  the  edge  of  the  base. 

• Improved  radiation  resistance  due  to  the  reduced  PN  junction  area. 

Polycrystalline  Arsenic  (PA)  Emitter  Source 

The  industry  standard  for  the  emitter-base  structure  of  microwave 
devices  is  a washed  emitter  in  which  the  emitter  diffusion  and  the  emitter 
contact  occur  in  the  same  oxide  cut,  thus  producing  a minimum  emitter-base 
junction  area.  The  basic  problem  with  this  technique  is  that  the  etch 
dip which is needed to remove the SiO2 formed during the emitter diffusion
also  etches  in  a lateral  direction.  This  enhances  the  incipient 
emitter-base  short  which  exists  when  the  contact  metal  is  evaporated 
and  alloyed,  since  this  approach  relies  only  on  lateral  diffusion  of  the 
emitter  for  protection  and  passivation  of  the  emitter-base  junction. 

This,  in  turn,  reduces  yield  and  high  temperature  reliability.  This 
standard  critical  process  step  is  eliminated  in  OAT  by  using  an  arsenic 
doped  polycrystalline  emitter  doping  source,  as  shown  in  Figure  4-3. 


Figure  4-3.  Polycrystalline  Arsenic  Emitter 
A heavily arsenic doped polycrystalline film is deposited and patterned.
The arsenic is driven in to form the emitter region. The doped poly is then
covered with metal and forms the ohmic contact between the active emitter
region and the metal contact. Thus, the metal system never makes contact
with the silicon and is separated from it by the thickness of the poly.
Emitter-base leakages and shorts are substantially reduced by this approach.

Semiconductor  Processing 

The  processing  sequence  is  listed  in  Table  4-1  and  many  of  the  steps  are 
illustrated  in  Figure  4-4.  The  P-type  substrate  is  oxidized,  coated  with 
photoresist,  and  the  buried  layer  mask  is  exposed  in  the  photoresist.  The 
developed  photoresist  serves  as  a mask  to  allow  the  oxide  to  be  etched 
away  in  the  position  of  the  N+  buried  layer  diffusion;  the  diffusion  is 
carried  out  using  the  remaining  oxide  as  a mask  to  produce  the  structure 
shown  in  4-4b.  The  oxide  mask  is  then  etched  off,  and  an  N-type  epitaxial 
layer is grown. The next step in the OAT process is to grow a thin oxide,
typically 200 Å, and deposit 1000 Å of silicon nitride on top of this.
This is shown in Figure 4-4c.

In  subsequent  processing,  use  is  made  of  the  fact  that  silicon  nitride 
and SiO2 can be independently etched. Phosphoric acid, which is used to
etch silicon nitride, attacks SiO2 very slowly, and the buffered HF
solution used to etch SiO2 exhibits almost no attack on silicon nitride.
Photoresist  cannot  be  used  directly  to  etch  a pattern  into  silicon  nitride, 
since  it  is  attacked  by  hot  phosphoric  acid,  but  patterns  can  be  etched 
indirectly  by  growing  or  depositing  an  oxide  layer  on  the  nitride,  etching 
a pattern  in  the  oxide  with  photoresist  as  a mask,  then  etching  the  nitride 
with  the  oxide  pattern  as  a mask. 

The  oxide  well  pattern  is  etched  into  the  silicon  nitride  layer  with 
the above procedure, and then about 0.4 µm of silicon is etched away using
the nitride as a mask, as shown in Figure 4-4d. Next, about 1 µm of oxide
is grown. The growth of this oxide consumes about 0.45 µm of silicon, with
the  result  that  the  oxide  is  recessed  into  the  silicon  wafer  to  produce  the 
deep  oxide  wells  shown  in  Figure  4-4e.  Next,  the  isolation  mask  is  used  to 
photoetch  the  silicon  nitride  from  the  region  of  the  isolation  diffusion 
(note  that  the  alignment  of  this  mask  is  not  critical,  as  it  can  overlap 
Table 4-1. OAT Process Sequence

OXIDIZE
PR* BURIED LAYER
BURIED LAYER DIFFUSION          (FIGURE 4-4B)
STRIP OXIDE
EPI DEPOSITION
NITRIDE DEPOSITION              (FIGURE 4-4C)
PR OXIDE WELL
SILICON ETCH                    (FIGURE 4-4D)
OXIDATION                       (FIGURE 4-4E)
PR ISOLATION
ISOLATION DIFFUSION
OXIDATION                       (FIGURE 4-4F)
PR DEEP COLLECTOR
DEEP COLLECTOR DIFFUSION        (FIGURE 4-4G)
STRIP NITRIDE
NITRIDE DEPOSITION
PR CONTACT                      (FIGURE 4-4H)
P+ BASE DIFFUSION
OXIDATION                       (FIGURE 4-4I)
STRIP NITRIDE
IMPLANT ACTIVE BASE
NITRIDE DEPOSITION
BASE ANNEAL/DRIVE-IN
PR EMITTER                      (FIGURE 4-4J)
DOPED POLY DEPOSITION
EMITTER DIFFUSION
PR EMITTER POLY                 (FIGURE 4-4K)
STRIP NITRIDE                   (FIGURE 4-4L)

*PR DENOTES THE ENTIRE PHOTOETCH PROCESS, INCLUDING
COATING WITH PHOTORESIST, EXPOSURE OF THE MASK,
DEVELOPMENT, AND ETCHING.



Figure 4-4. OAT Transistor Layout (panels a through l; legible panel titles: b) buried layer diffusion; c) nitride deposition; d) silicon etch; e) oxidation; f) isolation diffusion and oxidation; g) deep collector diffusion; h) PR contact; i) P+ base diffusion and oxidation; j) PR emitter; k) PR emitter poly; l) strip nitride)


onto the deep oxide well). Boron is diffused into the open region and then
a fairly thick oxide is grown to seal off the isolation region. This is
shown in Figure 4-4f. Next, the deep collector mask is used to photoetch
the silicon nitride from the deep collector window, and phosphorus is
diffused into the exposed silicon as shown in Figure 4-4g.
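The geometry of the recessed oxide from the well-forming step (0.4 µm silicon etch, about 1 µm of oxide grown, about 0.45 µm of silicon consumed) can be worked through with a short Python sketch. It is a one-dimensional estimate that ignores lateral oxidation at the nitride edge, and the function name and sign convention are hypothetical.

```python
def oxide_well_profile(si_etch_um=0.4, oxide_um=1.0, si_consumed_um=0.45):
    """Vertical position of the well oxide relative to the original surface.

    Positions are in micrometers; negative values lie below the original
    silicon surface, which is preserved under the nitride mask.
    Returns (bottom of oxide, top of oxide).
    """
    bottom = -(si_etch_um + si_consumed_um)   # etched depth plus consumed silicon
    top = bottom + oxide_um                   # oxide thickness measured upward
    return bottom, top


bottom, top = oxide_well_profile()
print(round(bottom, 2), round(top, 2))        # -0.85 0.15
```

On these figures the oxide reaches about 0.85 µm below the original surface and stands only about 0.15 µm above it, which is why the well is described as recessed into the wafer.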

At this point in the process, the silicon nitride covering the base region
is stripped off and a new silicon nitride layer deposited so that the entire
wafer is covered with a uniform thickness of nitride. The contact mask is
used to pattern this nitride layer as shown in Figure 4-4h. This leaves
silicon nitride every place where contact will subsequently be made to the
silicon. Boron is now diffused into the wafer. The boron concentration is
lower than the phosphorus concentration in the deep collector region, so
no junction is formed there, but a P+ base region is formed in the epitaxial
layer between the position of the base contacts and the emitter. The
diffused resistors are also formed at this step. An oxide is then grown,
giving the structure shown in Figure 4-4i. The nitride is then removed
everywhere, and the active base dopant is ion implanted. This step
provides a precise boron doping in the base contact regions. Another layer
of nitride is deposited and the emitter mask is used to open the regions
(emitter and collector) to which N-type contacts will be made. The doped
polycrystalline silicon layer is deposited and the emitter is diffused
from the doped poly. A reversed polarity copy of the emitter mask is used
to pattern the poly. It is left in place over the emitter and collector
contacts and etched away everywhere else on the wafer. This is shown in
Figure 4-4k. The semiconductor fabrication is completed by stripping off
the remaining silicon nitride to open the base contacts as shown in
Figure 4-4l. The wafer is now ready for surface processing, which will
include multilayer metallization.

Passive  Devices 

Metal-1

The first metal layer is a titanium-aluminum layer with 1500 Å of titanium
and 5000 Å of aluminum. This thickness is chosen to provide sufficient coverage
of semiconductor device steps and to keep the step heights created by
metal-1 to a minimum. The metals are serially evaporated using an electron
beam gun. Deposition pressure is approximately 10⁻⁶ torr and substrate
temperature is 300°C. Deposition rate is approximately 50 Å per second.

Dielectric  Deposition 

The dielectric layer is deposited to a thickness of 7000 Å, using
standard silane vapor deposition techniques. Substrate temperature is
400°C, and the deposition rate is approximately 200 Å per minute.

Metal-2

The second level of metal is an RF sputtered aluminum layer. The
layer thickness is 12,000 Å.

Passivation 

The passivation layer is a 3000 Å silane vapor deposited layer.

See  Table  4-2  for  a summary  of  the  surface  process  parameters. 

Table 4-2.  Summary of Surface Processing Parameters

Process Step   Temperature   Pressure         Layer Resistivity   Layer Thickness

Metal-1        300°C         10⁻⁸ torr        ~0.045 Ω/square     6,500 Å
Dielectric     400°C         -                -                   7,000 Å
Metal-2        200°C         6 x 10⁻³ torr    ~0.035 Ω/square     12,000 Å
Passivation    400°C         -                -                   3,000 Å
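
As a rough cross-check of the surface processing figures, the quoted thicknesses and deposition rates imply the deposition times below. This is only a sketch: it assumes a constant rate over the whole layer and assumes that the approximately 50 Å per second rate applies to both serially evaporated metal-1 films. The function name is illustrative and does not come from the report.

```python
# Deposition times implied by the thicknesses and rates quoted in the text.
# Assumptions (not stated in the report): the rate is constant for the whole
# layer, and both metal-1 films are evaporated at the same ~50 A/s rate.

def deposition_time(thickness_angstrom: float, rate_angstrom_per_unit: float) -> float:
    """Return thickness / rate, in whatever time unit the rate is expressed in."""
    if rate_angstrom_per_unit <= 0:
        raise ValueError("rate must be positive")
    return thickness_angstrom / rate_angstrom_per_unit

# Metal-1: 1500 A + 5000 A stack, 6,500 A total as listed in Table 4-2.
metal1_thickness = 1500 + 5000
metal1_seconds = deposition_time(metal1_thickness, 50.0)   # rate in A per second

# Dielectric: 7000 A at roughly 200 A per minute.
dielectric_minutes = deposition_time(7000, 200.0)          # rate in A per minute

print(f"Metal-1:    about {metal1_seconds:.0f} s")
print(f"Dielectric: about {dielectric_minutes:.0f} min")
```

Under these assumptions, metal-1 takes a little over two minutes of evaporation, while the dielectric takes about 35 minutes.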


Device  Characteristics 

The physical dimensions and electrical characteristics for a typical
OAT device are shown in Tables 4-3 and 4-4, respectively. The devices are
scaled for peak operating performance depending on their use within the
circuit. Figure 4-5 shows the f_T characteristics for a device optimized to
operate in the 5-10 mA range.

Table 4-3.  Physical Dimensions

Transistor:

Emitter Width              3 µm
Base Contact Width         4 µm
Metal-to-Metal Spacing     2 µm
Minimum Transistor Size    44 µm x 55 µm


Table 4-4.  Electrical Characteristics

Transistor:

f_T   = 4 to 5 GHz             (CE cutoff frequency)
C_CO  = 0.14 pF                (Zero-voltage collector-base capacitance)
C_EO  = 0.08 pF                (Zero-voltage emitter-base capacitance)
C_CS  = 0.16 pF at -3 volts    (Collector-substrate capacitance)
I_C   = 2 mA                   (Collector current)
r'_c  = 60 ohms                (Collector series bulk resistance)

Resistor:

R_s   = 240 ohms per square    (Sheet resistance)
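
The tabulated values can be applied in the usual way. The sketch below sizes a diffused resistor from the 240 ohms per square sheet resistance and forms a rough time constant from the collector series resistance and the collector-substrate capacitance. The 1000 ohm target and the 20 µm by 5 µm geometry are hypothetical values chosen only for illustration, and the time constant is an order-of-magnitude figure, not a delay reported in this document.

```python
# Illustrative use of the Table 4-4 values. Target resistance and geometry
# below are hypothetical; they do not appear in the report.

SHEET_RESISTANCE = 240.0   # ohms per square (Table 4-4)
R_C_OHMS = 60.0            # collector series bulk resistance (Table 4-4)
C_CS_FARADS = 0.16e-12     # collector-substrate capacitance at -3 V (Table 4-4)

def squares_needed(target_ohms: float, sheet: float = SHEET_RESISTANCE) -> float:
    """Number of squares (length / width) needed for a target resistance."""
    return target_ohms / sheet

def resistance_from_geometry(length_um: float, width_um: float,
                             sheet: float = SHEET_RESISTANCE) -> float:
    """Resistance of a rectangular diffused resistor: R = R_s * L / W."""
    return sheet * length_um / width_um

squares = squares_needed(1000.0)                    # hypothetical 1 kilohm target
example_ohms = resistance_from_geometry(20.0, 5.0)  # hypothetical 20 um x 5 um body
tau_seconds = R_C_OHMS * C_CS_FARADS                # rough r'_c * C_CS time constant

print(f"Squares for 1000 ohms: {squares:.2f}")
print(f"20 um x 5 um resistor: {example_ohms:.0f} ohms")
print(f"r'_c * C_CS:           {tau_seconds * 1e12:.1f} ps")
```

End effects, contact resistance, and the voltage dependence of the junction capacitance are ignored here, so the numbers are first-order estimates only.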
5.  CONCLUSIONS


The primary objectives of developing an LSI high-speed, low-power
digital output correlator have been met. The prime objective during
this phase of the program was to develop a 32-bit digital parallel
correlator with both an analog summed output and a digitally summed output.

Specifically,  the  program  accomplished  the  following: 

• Demonstration of a 32-bit digital output correlator capable of
operating at clock rates greater than 125 MHz.

• A 32-bit digital output correlator which dissipates 800 mW when
biased for ECL compatibility.

• Composite delay x power product of approximately 1.0 picojoule/gate.

• Analog bit-to-bit accuracy of < 5%.

The  main  problem  experienced,  however,  has  been  the  high  degree  of 
circuit  complexity  and  associated  large  die  size.  This  has  resulted  in 
extremely  low  production  yields. 

Some recommendations for making this a more producible circuit are:

• Reduce  the  correlator  length  from  32  bits  to  31  bits. 

• Remove  the  option  for  providing  low-level  differential  outputs, 
as  well  as  ECL  outputs. 

• Provide  either  the  skewed  outputs  or  the  de-skewed  outputs,  but 
not  an  option  for  both. 

The current chip size is 238 mils by 173 mils. This results in very
low yield for a process as complex as the OAT process. These recommended
changes would eliminate 16 latches and five full adders, and would simplify
the interconnect. This would reduce the chip size by approximately 20%,
thus reducing chip power and enhancing producibility.
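
To put the chip size figures in perspective, the sketch below converts the die dimensions to area and applies the 20% reduction. The report gives no yield model or defect density, so the Poisson yield formula and the defect density value used here are assumptions introduced purely to illustrate why a smaller die improves yield. The sketch also assumes the 20% figure refers to die area.

```python
import math

# Die area from the stated chip size, plus an illustrative yield comparison.
# ASSUMPTIONS (not from the report): the 20% reduction applies to area, yield
# follows a simple Poisson model, and the defect density is a made-up value.

MIL_TO_MM = 0.0254

width_mils, height_mils = 238, 173
area_mil2 = width_mils * height_mils      # current die area in square mils
area_mm2 = area_mil2 * MIL_TO_MM ** 2     # same area in square millimetres
reduced_mm2 = area_mm2 * 0.80             # after the approximately 20% reduction

def poisson_yield(die_area_mm2: float, defects_per_mm2: float) -> float:
    """Poisson yield model: Y = exp(-A * D)."""
    return math.exp(-die_area_mm2 * defects_per_mm2)

HYPOTHETICAL_DEFECT_DENSITY = 0.1         # defects per mm^2, illustrative only

yield_current = poisson_yield(area_mm2, HYPOTHETICAL_DEFECT_DENSITY)
yield_reduced = poisson_yield(reduced_mm2, HYPOTHETICAL_DEFECT_DENSITY)

print(f"Current area: {area_mil2} mil^2 ({area_mm2:.1f} mm^2)")
print(f"Reduced area: {reduced_mm2:.1f} mm^2")
print(f"Relative yield gain under this model: {yield_reduced / yield_current:.2f}x")
```

Under a Poisson model the relative gain depends only on the area removed and the defect density, so the absolute yields printed by such a sketch should not be read as predictions for the OAT process.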